Search Results: "benjamin"

18 May 2014

Benjamin Mako Hill: Installing GNU/Linux on a 2014 Lenovo Thinkpad X1 Carbon

I recently bought a new Lenovo X1 Carbon. It is the new second-generation, type 20A7 laptop, based on Intel's Haswell microarchitecture with the adaptive keyboard. It is the version released in 2014. I also ordered the ThinkPad OneLink Dock, which I have returned for the OneLink Pro Dock, which I have not yet received. The system is still very new, challenging, and different, but seems to support GNU/Linux reasonably well if you are willing to run a bleeding-edge version and/or patch your kernel, and if you are not afraid to spend an afternoon or two tweaking things. What follows are my installation notes for Debian testing (jessie) from when I installed it in early May 2014. My general impressions of the laptop as a GNU/Linux system and overall are at the end of this write-up.
System Description

The X1 Carbon I ordered included the 512GB SSD, the 14.0 inch WQHD (2560×1440) 260-nit touchscreen, and the maximum 8GB of memory. I believe the rest is not particularly negotiable but includes a 720p HD camera, a 45.2Wh battery, and an Intel Dual Band Wireless 7260AC with Bluetooth 4.0. For those who are curious, here is the output of lspci on the system:
00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 0b)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b)
00:14.0 USB controller: Intel Corporation Lynx Point-LP USB xHCI HC (rev 04)
00:16.0 Communication controller: Intel Corporation Lynx Point-LP HECI #0 (rev 04)
00:16.3 Serial controller: Intel Corporation Lynx Point-LP HECI KT (rev 04)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-LM (rev 04)
00:1b.0 Audio device: Intel Corporation Lynx Point-LP HD Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 6 (rev e4)
00:1c.1 PCI bridge: Intel Corporation Lynx Point-LP PCI Express Root Port 3 (rev e4)
00:1d.0 USB controller: Intel Corporation Lynx Point-LP USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Lynx Point-LP LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation Lynx Point-LP SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation Lynx Point-LP SMBus Controller (rev 04)
BIOS/Firmware

The BIOS firmware is non-free and proprietary, as is the case with all ThinkPads and nearly all laptops. According to this thread, there is a bug in the default BIOS that means that suspend to RAM is broken in GNU/Linux. You can get an updated BIOS at Lenovo's ThinkPad X1 Carbon (Type 20A7, 20A8) Drivers and software page by looking in the BIOS section. Honestly, the easiest approach is probably to download the Windows BIOS Update utility (documentation is here) which you can use to run the BIOS update from within Windows before you install GNU/Linux. If that's not an option (e.g., if you've already installed GNU/Linux) the best method is to download the bootable CD ISO from the same page. Of course, since the X1 Carbon has no optical media, you have to find another way to boot the CD image. I struggled to get the ISO to boot from USB using the usually reliable dd method. This message suggests that the issue had to do with the El Torito wrapper:
I had to dump the eltorito image from the ISO they provide, after that I was able to dd the resulting image to a flash drive and the bios update went well, no cdrom needed.
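In practice, that dance can probably be done with the geteltorito script from Debian's genisoimage package; I have not re-tested this exact sequence, and the ISO file name and the USB device below are placeholders:

# apt-get install genisoimage
# geteltorito -o bios-update.img BIOS-bootable-cd.iso
# dd if=bios-update.img of=/dev/sdX bs=1M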
I updated to version 1.13 of the BIOS which fixes the suspend/resume bug. By the time you read this, there may be newer versions that fix other things so check the Lenovo website.
Installing Debian

I installed Debian testing using the March 19, 2014 Alpha 1 release of the Debian Installer for Jessie (currently testing). I installed in graphical mode. With the WQHD screen, everything was extremely tiny, but it worked flawlessly. I downloaded the amd64 net install image from the normal place and installed the rest of the system using the built-in Ethernet port, which required no firmware or extra drivers. I did the normal dd if=FILENAME.iso of=/dev/sdX method of getting the installer onto a USB stick to boot. I turned off restricted boot in the BIOS first. In general, the latest version of the Debian installation guide is always a good source of guidance on installing Debian. I used the Debian installer wizard to partition and selected "Use entire disk and partition it for LVM and encrypted data", which kept the UEFI partitions around. The system installed with no errors or issues and booted up normally afterward. The grub menu is hilariously narrow on the WQHD screen. If you want to use the built-in wireless and/or Bluetooth, you will need to install the non-free iwlwifi firmware package. It is very lame that we still have to do this to use hardware we have purchased.
What Works and Doesn't

The following stuff worked the first time I booted into the GNOME 3 desktop and logged in:
  • The WQHD 2560×1440 screen
  • The touchscreen
  • Both the TrackPoint and the touchpad
  • Built-in e1000e Ethernet using the dongle
  • The keyboard plus the adaptive row of F1-F12 keys.
  • External monitor using the full HDMI or mini-DisplayPort connectors
  • Audio (both speakers and microphone)
  • The camera/webcam
The following stuff works if you install non-free firmware:
  • Internal Wireless
  • Bluetooth 4.0
The following stuff works with qualifications:
  • Suspend to RAM: works once you have updated the firmware.
  • The adaptive keyboard: the F1-F12 keys work, but the button that theoretically lets you switch to different sets of function buttons (e.g., volume, brightness) does nothing.
  • Disabling the touchpad: there is a BIOS option to disable the touchpad. It works in Windows and does nothing at all in GNU/Linux.
I have not tried:
  • The fingerprint reader
Disabling the touchpad

As a long-term ThinkPad user, I love the TrackPoint pointing stick. If you plan on using this, the built-in touchpad is incredibly aggravating because it is very easy to brush against it while using the TrackPoint. In the BIOS, there is an option to disable the touchpad. Although this works in Windows, it does absolutely nothing in GNU/Linux. Part of the issue is that, unlike the older X1 Carbon and other ThinkPads, there are no TrackPoint buttons. Instead of buttons, there are regions at the top of the touchpad which are configured, in software, to act like buttons. If you want to be able to click, the touchpad can never be truly turned off. This is not a problem unique to the Haswell X1 Carbon, and a number of people have been struggling with this issue on other Lenovo laptops. Essentially, what you need to do is configure your touchpad so that the buttons are where you want them and so that it ignores any input for the purposes of cursor movement. There are a few ways of doing this, but this answer from an askubuntu.com question has the solution I ended up using:

Open the file /usr/share/X11/xorg.conf.d/50-synaptics.conf for editing.

Find the Section "InputClass" that contains the line Identifier "Default clickpad buttons".

Edit the SoftButtonAreas option to the values "64% 0 1 42% 36% 64% 1 42%"; this sets the size of the right and middle buttons.

Enable the AreaBottomEdge option and change its value to 1; this will disable touchpad cursor movement.

If everything is done right, your section should look like:

Section "InputClass"
     Identifier "Default clickpad buttons"
     MatchDriver "synaptics"
     Option "SoftButtonAreas" "64% 0 1 42% 36% 64% 1 42%"
     Option "AreaBottomEdge" "1"
EndSection
Essentially, the first Option line defines a right soft button spanning from 64% of the touchpad's width to the right edge and a middle button spanning from 36% to 64% of the width, both covering the top 42% of the touchpad's height. The synaptics manpage (man synaptics) will give you more detail on the general way this works. Of course, something does feel very wrong about editing a file in /usr/share.
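If that bothers you too, a somewhat cleaner alternative (which I have not tested on this machine, but which follows standard Debian practice) is to leave the file in /usr/share alone and carry the same edits in a local copy under /etc/X11/xorg.conf.d/, which package upgrades will never overwrite:

# mkdir -p /etc/X11/xorg.conf.d
# cp /usr/share/X11/xorg.conf.d/50-synaptics.conf /etc/X11/xorg.conf.d/50-synaptics.conf

Then add the SoftButtonAreas and AreaBottomEdge options to the "Default clickpad buttons" section in the copy, exactly as shown above.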
Fixing the Adaptive Keyboard

The most wild feature of the laptop is the adaptive keyboard strip. The strip is a back-lit LCD that looks almost like an E Ink screen and acts as a touchscreen keyboard. The default mode gives you the F1-F12 keys. If you press the keys (since they aren't buttons, you just put your finger on top of them) they act like normal F-keys. You can Ctrl-Alt-F1, etc., to switch to virtual terminals out of the box. There are four modes: Function (i.e., normal F-keys), Home, Web, and Chat. The last three overlap quite a bit (e.g., they all have brightness and volume). You can play with an example on the Lenovo homepage. In Windows, switching programs will apparently change these keys so that an appropriate set of buttons is shown for the application you are using. You can also change these keys manually with a big Fn button at the far left of the adaptive keyboard strip. As I write this, released kernels do not support the adaptive keyboard Fn button, which means you cannot use anything other than the F-keys out of the box. I believe it also means that resuming from suspend to RAM breaks these keys. That said, Shuduo Sang from Canonical has released several versions of a patch to the thinkpad_acpi kernel module which adds support for the Home mode. The other modes (Web and Chat) do not seem to be supported. The latest version of the patch is on the Linux Kernel Mailing List and the relevant commits are:
330947b save and restore adaptive keyboard mode for suspend and,resume
3a9d20b support Thinkpad X1 Carbon 2nd generation's adaptive keyboard
Although this is not supported in Debian testing at the time of writing, a bug was filed in Debian and quickly fixed by Ben Hutchings in Debian kernel version 3.14.2-1, which is currently in sid/unstable. As a result, if you install the latest kernel from Debian unstable (3.14.2-1 or later), the adaptive keyboard just works. If you aren't using Debian and the kernel you are using does not have support, you may need to patch your kernel yourself.
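For reference, pulling that newer kernel onto a testing system can be as simple as the following sketch, assuming you have added a sid/unstable line to /etc/apt/sources.list (and, ideally, APT pinning so the rest of the system stays on testing):

# apt-get update
# apt-get install -t unstable linux-image-amd64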
General Impressions

As I have described in my interview with The Setup, I have been a user of ThinkPad X-series laptops for many years. This is my sixth X-series ThinkPad. Overall, I quite like the hardware! Once things mature a little bit, I think that this will be a great laptop for running GNU/Linux. That said, I ordered the laptop without realizing that the X1 Carbon had gone through a major revision! The keyboard was quite a surprise. I think that changing a system so radically without changing the model name/number is a very bad move on Lenovo's part.

There are two remaining issues with the system I'm still struggling with: (1) the keyboard layout is freaky and weird, and (2) the super high resolution screen breaks many things.

The quality of the keyboard itself is great and worthy of the ThinkPad name. That said, there are two ways in which it is strange. The first is the adaptive keyboard strip. Overall, it works surprisingly well and I think it is a clever idea. My sense is that the strip is more annoying in Windows because it changes out from under you all the time. In GNU/Linux, only manual changing of modes is supported. This, in my opinion, is a feature. I do miss the real feedback you get from pressing keys, but for F-keys and volume keys that I don't use often this isn't too important. On the downside, I have realized several times that I had been holding down a button for several seconds and not noticed.

The more annoying issue with the keyboard is the way that the other keys have moved around. Getting rid of the CapsLock is wonderful! How has this taken so long? Replacing it with split Home and End keys is nuts. I've remapped Home and End to put Control back where it should be. My right Control key is now Home, but I still don't have an End key. The split Backspace and Delete is not a problem for me. The tilde/apostrophe is in a very bad place. There is no Insert, Print Screen/SysRq, Scroll Lock, Pause/Break or NumLock. They are all just gone. Surprisingly, I haven't missed any of them.

The second issue is the 2560×1440 resolution on the 14 inch screen. I use a 27 inch external monitor with the same native resolution but, by my arithmetic, the pixel density on the laptop is 210 DPI instead of 109 DPI on the external monitor. The result is the scaling problem and it's a huge pain that seems mostly unsolved on any operating system. Fonts and widgets that look good on the laptop look huge on my external monitor. Stuff that looks good on my external monitor looks minuscule on the laptop. I routinely move windows between my laptop screen and my large monitor. Until I find a display system that can handle this kind of scaling effectively, this requires changing font sizes and zooming all the time. At the moment, I'm shrinking and expanding my font size using the built-in hot keys in Emacs, Gnome Terminal, and Firefox/Iceweasel. I love the high resolution screen but the current situation is crazy-making.

Finally, this setup will not get you into the Church of Emacs and it's not about to find its way onto the FSF's list of endorsed hardware. For one, I paid the Windows tax. Beyond that, there is the non-free BIOS and the need for non-free firmware to use the wireless and Bluetooth. This is standard for ThinkPads but it isn't getting any easier to swallow. There are alternatives in the form of Gluglug's X60 laptops running CoreBoot, Lemote Yeeloong laptops, Bunnie Huang's Novena, and others that are better in these regards.
I am very excited about these projects but, for a number of reasons, they just weren't an option for the laptop I use for my research computing.

12 May 2014

Benjamin Mako Hill: Google Has Most of My Email Because It Has All of Yours

Republished by Slate. Translations available in French (Français), Spanish (Español), and Chinese.

For almost 15 years, I have run my own email server, which I use for all of my non-work correspondence. I do so to keep autonomy, control, and privacy over my email and so that no big company has copies of all of my personal email.

A few years ago, I was surprised to find out that my friend Peter Eckersley, a very privacy-conscious person who is Technology Projects Director at the EFF, used Gmail. I asked him why he would willingly give Google copies of all his email. Peter pointed out that if all of your friends use Gmail, Google has your email anyway. Any time I email somebody who uses Gmail, and any time they email me, Google has that email.

Since our conversation, I have often wondered just how much of my email Google really has. This weekend, I wrote a small program to go through all the email I have kept in my personal inbox since April 2004 (when Gmail was started) to find out.

One challenge with answering the question is that many people, like Peter, use Gmail to read, compose, and send email but configure Gmail to send email from a non-gmail.com From address. To catch these, my program looks through each message's headers, which record the computers that handled the message on its way to my server, and picks out messages that have traveled through google.com, gmail.com, or googlemail.com. Although I usually filter them, my personal mailbox contains emails sent through a number of mailing lists. Since these mailing lists often hide the true provenance of a message, I exclude all messages that are marked as coming from lists using the (usually invisible) Precedence header.

The following graph shows the number of emails in my personal inbox each week in red and the subset from Google in blue. Because the number of emails I receive week-to-week tends to vary quite a bit, I've included a LOESS smoother which shows a moving average over several weeks.

[Graph: emails, total and from Gmail, over time]

From eyeballing the graph, the answer seems to be that, although it varies, about a third of the email in my inbox comes from Google! Keep in mind that this is all of my personal email and includes automatic and computer-generated mail from banks and retailers, etc. Although it is true that Google doesn't have these messages, it suggests that the proportion of my truly personal email that comes via Google is probably much higher.

I would also like to know how much of the email I send goes to Google. I can do this by looking at emails in my inbox that I have replied to. This works if I am willing to assume that if I reply to an email sent from Google, it ends up back at Google. In some ways, doing this addresses the problem with the emails from retailers and banks, since I am very unlikely to reply to those emails. In this sense, it also reflects a measure of more truly personal email.

I've broken down the proportions of emails I received that come from Google in the graph below for all email (top) and for emails I have replied to (bottom). In the graphs, the size of the dots represents the total number of emails counted to make that proportion. Once again, I've included the LOESS moving average.

[Graph: proportion of emails from Gmail over time]

The answer is surprisingly large. Despite the fact that I spend hundreds of dollars a year and hours of work to host my own email server, Google has about half of my personal email! Last year, Google delivered 57% of the emails in my inbox that I replied to. They have delivered more than a third of all the email I've replied to every year since 2006 and more than half since 2010. On the upside, there is some indication that the proportion is going down. So far this year, only 51% of the emails I've replied to arrived from Google.

The numbers are higher than I imagined and reflect somewhat depressing news. They show how complicated it is to think about privacy and autonomy for communication between parties. I'm not sure what to do except encourage others to consider, in the wake of the Snowden revelations and everything else, whether you really want Google to have all your email. And half of mine.

If you want to run the analysis on your own, you're welcome to the Python and R code I used to produce the numbers and graphs.
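To give a feel for the approach, here is a minimal sketch of the header check described above (this is not the actual script linked here; the mbox path is a placeholder, and list mail is detected only via the Precedence header):

import mailbox
import os
import re

GOOGLE_RE = re.compile(r'(google|gmail|googlemail)\.com', re.IGNORECASE)

total = from_google = 0
for msg in mailbox.mbox(os.path.expanduser('~/mail/inbox.mbox')):
    # Skip mailing-list traffic, which hides a message's true provenance.
    if (msg.get('Precedence') or '').lower() in ('list', 'bulk', 'junk'):
        continue
    total += 1
    # The Received headers record which machines handled the message.
    received = ' '.join(msg.get_all('Received') or [])
    if GOOGLE_RE.search(received):
        from_google += 1

print('%d of %d messages came via Google' % (from_google, total))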

9 April 2014

Petter Reinholdtsen: S3QL, a locally mounted cloud file system - nice free software

For a while now, I have been looking for a sensible offsite backup solution for use at home. My requirements are simple: it must be cheap and locally encrypted (in other words, I keep the encryption keys, and the storage provider does not have access to my private files).

One idea my friends and I had many years ago, before the cloud storage providers showed up, was to use Google mail as storage, writing a Linux block device storing blocks as emails in the mail service provided by Google, and thus get heaps of free space. On top of this one can add encryption, RAID and volume management to have lots of (fairly slow, I admit that) cheap and encrypted storage. But I never found time to implement such a system.

The last few weeks I have looked at a system called S3QL, a locally mounted network-backed file system with the features I need. S3QL is a FUSE file system with a local cache and cloud storage, handling several different storage providers, anything with an Amazon S3, Google Drive or OpenStack API. There are heaps of such storage providers. S3QL can also use a local directory as storage, which combined with sshfs allows for file storage on any ssh server. S3QL includes support for encryption, compression, de-duplication, snapshots and immutable file systems, allowing me to mount the remote storage as a local mount point and look at and use the files as if they were local, while the content is stored in the cloud as well. This allows me to have a backup that should survive a fire. The file system can not be shared between several machines at the same time, as only one can mount it at a time, but any machine with the encryption key and access to the storage service can mount it if it is unmounted.

It is simple to use. I'm using it on Debian Wheezy, where the package is included already. So to get started, run apt-get install s3ql. Next, pick a storage provider. I ended up picking Greenqloud, after reading their nice recipe on how to use S3QL with their Amazon S3 service, because I trust the laws in Iceland more than those in the USA when it comes to keeping my personal data safe and private, and thus would rather spend money on a company in Iceland. Another nice recipe is available in the article S3QL Filesystem for HPC Storage by Jeff Layton in the HPC section of Admin magazine.

When the provider is picked, figure out how to get the API key needed to connect to the storage API. With Greenqloud, the key did not show up until I had added payment details to my account. Armed with the API access details, it is time to create the file system. First, create a new bucket in the cloud. This bucket is the file system storage area. I picked a bucket name reflecting the machine that was going to store data there, but any name will do. I'll refer to it as bucket-name below. In addition, one needs the API login and password, and a locally created password. Store it all in ~root/.s3ql/authinfo2 like this:
[s3c]
storage-url: s3c://s.greenqloud.com:443/bucket-name
backend-login: API-login
backend-password: API-password
fs-passphrase: local-password
I create my local passphrase using pwget 50 or similar, but any sensible way to create a fairly random password should do it. Armed with these details, it is now time to run mkfs, entering the API details and password to create it:
# mkdir -m 700 /var/lib/s3ql-cache
# mkfs.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
  --ssl s3c://s.greenqloud.com:443/bucket-name
Enter backend login: 
Enter backend password: 
Before using S3QL, make sure to read the user's guide, especially
the 'Important Rules to Avoid Loosing Data' section.
Enter encryption password: 
Confirm encryption password: 
Generating random encryption key...
Creating metadata tables...
Dumping metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Compressing and uploading metadata...
Wrote 0.00 MB of compressed metadata.
# 
The next step is mounting the file system to make the storage available.
# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
  --ssl --allow-root s3c://s.greenqloud.com:443/bucket-name /s3ql
Using 4 upload threads.
Downloading and decompressing metadata...
Reading metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Mounting filesystem...
# df -h /s3ql
Filesystem                              Size  Used Avail Use% Mounted on
s3c://s.greenqloud.com:443/bucket-name  1.0T     0  1.0T   0% /s3ql
#
The file system is now ready for use. I use rsync to store my backups in it, and as the metadata used by rsync is downloaded at mount time, no network traffic (and storage cost) is triggered by running rsync. To unmount, one should not use the normal umount command, as this will not flush the cache to the cloud storage; instead, run the umount.s3ql command like this:
# umount.s3ql /s3ql
# 
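The rsync runs mentioned above are nothing fancy, by the way; a hypothetical invocation into the mounted file system (the source and destination paths are only examples) could look like:

# rsync -aH --delete /home/ /s3ql/backup/home/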
There is a fsck command available to check the file system and correct any problems detected. This can be used if the local server crashes while the file system is mounted, to reset the "already mounted" flag. This is what it looks like when processing a working file system:
# fsck.s3ql --force --ssl s3c://s.greenqloud.com:443/bucket-name
Using cached metadata.
File system seems clean, checking anyway.
Checking DB integrity...
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 5000 objects so far..
..processed 10000 objects so far..
..processed 15000 objects so far..
Checking objects (sizes)...
Checking blocks (referenced objects)...
Checking blocks (refcounts)...
Checking inode-block mapping (blocks)...
Checking inode-block mapping (inodes)...
Checking inodes (refcounts)...
Checking inodes (sizes)...
Checking extended attributes (names)...
Checking extended attributes (inodes)...
Checking symlinks (inodes)...
Checking directory reachability...
Checking unix conventions...
Checking referential integrity...
Dropping temporary indices...
Backing up old metadata...
Dumping metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Compressing and uploading metadata...
Wrote 0.89 MB of compressed metadata.
# 
Thanks to the cache, working on files that fit in the cache is very quick, about the same speed as local file access. Uploading large amounts of data is, for me, limited by the bandwidth out of and into my house. Uploading 685 MiB with a 100 MiB cache gave me 305 kiB/s, which is very close to my upload speed, and downloading the same Debian installation ISO gave me 610 kiB/s, close to my download speed. Both were measured using dd. So for me, the bottleneck is my network, not the file system code. I do not know what a good cache size would be, but suspect that the cache should be larger than your working set. I mentioned that only one machine can mount the file system at a time. If another machine tries, it is told that the file system is busy:
# mount.s3ql --cachedir /var/lib/s3ql-cache --authfile /root/.s3ql/authinfo2 \
  --ssl --allow-root s3c://s.greenqloud.com:443/bucket-name /s3ql
Using 8 upload threads.
Backend reports that fs is still mounted elsewhere, aborting.
#
The file content is uploaded when the cache is full, while the metadata is uploaded once every 24 hours by default. To ensure the file system content is flushed to the cloud, one can either unmount the file system, or ask S3QL to flush the cache and metadata using s3qlctrl:
# s3qlctrl upload-meta /s3ql
# s3qlctrl flushcache /s3ql
# 
If you are curious about how much space your data uses in the cloud, and how much compression and deduplication cut down on the storage usage, you can use s3qlstat on the mounted file system to get a report:
# s3qlstat /s3ql
Directory entries:    9141
Inodes:               9143
Data blocks:          8851
Total data size:      22049.38 MB
After de-duplication: 21955.46 MB (99.57% of total)
After compression:    21877.28 MB (99.22% of total, 99.64% of de-duplicated)
Database size:        2.39 MB (uncompressed)
(some values do not take into account not-yet-uploaded dirty blocks in cache)
#
I mentioned earlier that there are several possible suppliers of storage. I did not try to locate them all, but am aware of at least Greenqloud, Google Drive, Amazon S3 web services, Rackspace and Crowncloud. The latter even accepts payment in Bitcoin. Pick one that suits your needs. Some of them provide several GiB of free storage, but the price models are quite different and you will have to figure out what suits you best.

While researching this blog post, I had a look at research papers and posters discussing the S3QL file system. There are several, which told me that the file system is getting a critical check by the science community and increased my confidence in using it. One nice poster is titled "An Innovative Parallel Cloud Storage System using OpenStack's Swift Object Store and Transformative Parallel I/O Approach" by Hsing-Bung Chen, Benjamin McClelland, David Sherrill, Alfred Torrez, Parks Fields and Pamela Smith. Please have a look.

Given my problems with different file systems earlier, I decided to check out the mounted S3QL file system to see if it would be usable as a home directory (in other words, that it provides POSIX semantics when it comes to locking, umask handling and so on). Running my test code to check file system semantics, I was happy to discover that no error was found. So the file system can be used for home directories, if one chooses to do so.

If you do not want a locally mounted file system, and want something that works without the Linux FUSE file system, I would like to mention the Tarsnap service, which also provides locally encrypted backup using a command line client. It has a nicer access control system, where one can split out read and write access, allowing some systems to write to the backup and others to only read from it.

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

19 March 2014

Russ Allbery: WebAuth 4.6.0

I was going to put out some of these changes in a 4.5.6 release late last year, but that didn't happen, and then more things kept coming up. So this release is rather large. The major new feature is a new WebAuthCookiePath directive for mod_webauth, which allows path-scoped WebAuth cookies so that different portions of a site can maintain separate authentication credentials. There are various caveats, and support will get better later, but it's a beginning. There are two bug fixes from Benjamin Coddington: WebAuthOptional should now work with Apache 2.4, and internal notes management in the module is now done better, which should prevent some cases where the user was redirected to WebLogin twice. Eventually, the things WebAuth uses notes for should become request context data, but that's for a later change. There are multiple changes to keyring handling to let mod_webauth and mod_webkdc work properly with the ITK Apache MPM, which allows each virtual host to run as a different user. Previously, all virtual hosts shared one in-memory keyring, which meant leaking authentication keys between virtual hosts. Now, each virtual host gets its own, lazily loaded from the keyring on disk when it's first needed. This allows ITK users to configure separate keyrings for each virtual host. To make this easier, keyring files are now locked for write, and writing a keyring preserves the ownership and permissions if possible. WebLogin now supports a new remctl-based password change protocol, which I developed for Stanford to work around some problems with kpasswd when a password change takes too long. All the tools for this will eventually be available outside of Stanford when I have a chance to polish them up and release them. There are a few other, more minor bug fixes. mod_webauth and WebLogin are now more aggressive about telling web browsers to really not cache pages. WebLogin also now uses the authenticated identity returned by the WebKDC for multifactor, since it may have canonicalized the user's identity. The correct template variable is now set when the user doesn't enter a code on the WebLogin multifactor page. Better error messages are returned for invalid principals and unknown realms. The workaround for invalid XML returned by the WebKDC should now actually work. And WebLogin logs a more detailed error message on password change failures. You can get the latest release from the official WebAuth distribution site or from my WebAuth distribution pages.

16 March 2014

Benjamin Mako Hill: Community Data Science Workshops in Seattle

Photo from the Boston Python Workshop, a similar workshop run in Boston that has inspired and provided a template for the CDSW.

On three Saturdays in April and May, I will be helping run three day-long project-based workshops at the University of Washington in Seattle. The workshops are for anyone interested in learning how to use programming and data science tools to ask and answer questions about online communities like Wikipedia, Twitter, free and open source software, and civic media. The workshops are for people with no previous programming experience and the goal is to bring together researchers as well as participants and leaders in online communities. The workshops will all be free of charge and open to the public given availability of space. Our goal is that, after the three workshops, participants will be able to use data to produce numbers, hypothesis tests, tables, and graphical visualizations to answer questions like:

If you are interested in participating, fill out our registration form here. The deadline to register is Wednesday March 26th. We will let participants know if we have room for them by Saturday March 29th. Space is limited and will depend on how many mentors we can recruit for the sessions.

If you already have experience with Python, please consider helping out at the sessions as a mentor. Being a mentor will involve working with participants and talking them through the challenges they encounter in programming. No special preparation is required. If you're interested, send me an email.

8 March 2014

Benjamin Mako Hill: V-Day

My friend Noah mentioned the game VVVVVV. I was confused because I thought he was talking about the visual programming language vvvv. I went to Wikipedia to clear up my confusion but ended up on the article on VVVVV, which is about the Latin phrase "vi veri universum vivus vici", meaning "by the power of truth, I, while living, have conquered the universe". There is no Wikipedia article on VVVVVVV. That would be ridiculous.

3 February 2014

Benjamin Mako Hill: Admiral Ackbar on Persian Governors

[Image: Admiral Ackbar. "Title for a governor in ancient Persia?" "It's satrap!"]

Q: The title for a governor in ancient Persia? A: It's satrap!

29 January 2014

Benjamin Mako Hill: Aaron Swartz A Year Later

My friend Aaron Swartz died a little more than a year ago. This time last year, I was spending much of my time speaking with journalists and reading what they were writing about Aaron. Since the anniversary of his death, I have tried to take time to remember Aaron. I've returned to the things I wrote and the things I said, including this short article published last year in Red Pepper that SJ Klein and I wrote together but that I forgot to mention on my blog. I'm also excited to see that a documentary film about Aaron premiered at the Sundance Film Festival last week. I was interviewed for the film but am not in it. As I said last year at a memorial for Aaron, I think about Aaron frequently and often think about my own decisions in terms of what Aaron would have done. I continue to be optimistic about the potential for Aaron-inspired action.

Emanuele Rocca: Antifeatures and Antibugs

Software Engineering distinguishes between software features and software bugs. It is usually understood that features are positive, expected characteristics of a computer program. Features make users happy by allowing them to do something useful, interesting, or fun. Something good, anyways. Bugs are instead undesirable and annoying. You're sitting there at your computer writing a long email and the software crashes right before your email is sent. Bad stuff. Features are generally implemented by programmers on purpose, whereas bugs are purely unintentional. They are mistakes. You don't make a mistake on purpose. We might at this point be inclined to think that: i) what is good for users is done on purpose by software manufacturers; ii) what is bad for users was not meant to be. It happened by mistake. Here is a handy table to visualize this idea:
             On purpose     By mistake
Good         Feature
Bad                         Bug
It seems to make a lot of sense. But you might have noticed that two cells of the table are empty. Right! In a great talk titled "When Free Software isn't better", Benjamin Mako Hill mentions the concept of antifeatures, and how they relate to Free Software. Antifeatures are features that make the software do something users will hate. Something they will hate so much they would pay to have those features removed, if that's an option. Microsoft Windows 7 is used in the talk to provide some examples of software antifeatures: the Starter Edition does not allow users to change their background image. Also, it limits the amount of usable memory on the computer to 2 GB, regardless of how much memory the system actually has. Two antifeatures engineered to afflict users to the point that they will purchase a more expensive version of the software, if they have the means to do that. I have another nice example. The Spotify music streaming service plays advertisements between songs every now and then. To make sure users are annoyed as much as possible, Spotify automatically pauses an advertisement if it detects that the volume is being lowered. A poor Spotify user even tried to report the bug on The Spotify Community forum, only to find out that what she naively considered a software error was "intentional behavior". A spectacular antifeature indeed. Whenever a piece of technology does something you most definitely do not want it to do, such as allowing the NSA to take complete control of your Apple iPhone, including turning on its microphone and camera against your will, that's an antifeature.
             On purpose     By mistake
Good         Feature
Bad          Antifeature    Bug
Both bugs and antifeatures are bad for users. The difference between them is that antifeatures are engineered. Time and money are spent to make sure the goal is reached. A testing methodology is followed. "Are we really sure customers cannot change their wallpaper even if they try very very hard?" Engineering processes, of course, can fail. If the poor devils at Microsoft who implemented those harassments had made a mistake that allowed users to somehow change their wallpaper on Windows Starter... well, I would call that a glorious antibug.
             On purpose     By mistake
Good         Feature        Antibug
Bad          Antifeature    Bug
There is no place for antifeatures in Free and Open Source Software. Free Software gives users control over what their software does. Imagine Mozilla adding a feature to Firefox that sets your speakers' volume to 11 and starts playing a random song from the black metal artist Burzum every time you add a bookmark, unless you pay for Mozilla Firefox Premium Edition. The source code for Firefox is available under a free license. People who are not into Burzum's music would immediately remove this neat antifeature. I have spent many years of my life advocating Free and Open Source Software, perhaps using the wrong arguments. Mako's talk made me think about all this (thanks mate!). All these years I've been preaching about the technical superiority of Free Software, despite evidence of thousands of bugs and usability issues in the very programs I am using and helping to develop. Free Software is not better than Proprietary Software per se. Sometimes it is, sometimes it's not. But it gives you control, and freedom. When it annoys you, when it doesn't do what you expect and want, you can be sure it's not on purpose. And we can fix it together.

27 January 2014

Benjamin Mako Hill: My Geekhouse Bike Frame

In 2011, Mika and I bought in big at the Boston Red Bones party's charity raffle supporting MassBike and NEMBA and came out huge. I won $500 off a custom frame at Geekhouse Bikes. For years, Mika and I have been planning to do the Tour d'Afrique route (Cape Town to Cairo), unsupported, by bike. People who do this type of ride sometimes use an expedition touring frame. I worked with Marty Walsh at Geekhouse to design a bike based on this idea. The concept was a rugged steel touring frame, built for my body and comfortable over long distances, with two quirks:
  1. It's designed for 26 inch mountain bike wheels and mountain bike components to ensure that the bike is repairable with parts from the kinds of cheap mountain bikes that can be found almost everywhere in the world.
  2. It includes S&S torque couplers that let me split the frame in half to travel with the bike as standard luggage.
As our pan-Africa trip kept getting pushed back, so did the need for the bike. Last week, I finally picked up the finished bike from Marty's shop in Boston. It is gorgeous. I absolutely love it.

[Pictures of the Geekhouse frame]

I'm looking forward to building up the bicycle over the next couple of months and I'll post more pictures when it's finished. I am blown away by Marty's craftsmanship and attention to detail. I am psyched that his donation made this bike possible and that I was able to get the frame while helping cycling in Massachusetts!

31 December 2013

Benjamin Mako Hill: When Free Software Isn't Better Talk

In late October, the FSF posted this video of a talk called When Free Software Isn't (Practically) Better that I gave at LibrePlanet earlier in the year. I noticed it was public when, out of the blue, I started getting both a bunch of positive feedback about the talk as well as many people pointing out that my slides (which were rather important) were not visible in the video! Finally, I've managed to edit together a version that includes the slides and posted it online and on YouTube. The talk is very roughly based on this 2010 article and I argue that, despite our advocacy, free software isn't always (or even often) better in practical terms. The talk moves beyond the article and tries to be more constructive by pointing to a series of inherent practical benefits grounded in software freedom principles and practice. Most important to me, though, the talk reflects my first serious attempt to bring together some of the findings from my day job as a social scientist with my work as a free software advocate. I present some nuggets from my own research and talk about what they mean for free software and its advocates. In related news, it also seems worth noting that I'm planning on being back at LibrePlanet this March and that the FSF annual fundraiser is currently going on.

14 November 2013

Benjamin Drung: Wanted: Most secure unencrypted email solution

Dear lazy web, thanks to the global surveillance disclosures, I am searching for a secure email solution. Using end-to-end encryption seems to be the only secure way to keep the email content private, but it does not protect your email headers. End-to-end encryption has the big drawback that the communication partner has to use it too, which is rarely the case. I want to communicate as securely as possible even with people who do not use end-to-end encryption. What is the most secure unencrypted email solution? Should I rent a (virtual) server in my country (Germany) and run my own email server on it? Do you know any reliable, inexpensive server host for such a use case?

10 November 2013

Gregor Herrmann: RC bugs 2013/45

here's the list of RC bugs I've worked on during the last week.

5 November 2013

Benjamin Mako Hill: Settling in Seattle

[Photo: Seattle from the air]

I defended my dissertation three months ago. Since then, it feels like everything has changed. I've moved from Somerville to Seattle, moved from MIT to the University of Washington, and gone from being a graduate student to a professor. Mika and I have moved out of a multi-apartment cooperative into a small apartment we're calling Extraordinary Least Squares. We've gone from a broad and deep social network to (almost) starting from scratch in a new city. As things settle and I develop a little extra bandwidth, I am trying to take time to get connected to my community. If you're in Seattle and know me, drop me a line! If you're in Seattle but don't know me yet, do the same so we can fix that!

3 August 2013

Benjamin Mako Hill: Doctor of Philosophy

On Wednesday, I successfully defended my PhD dissertation in front of a ridiculously packed house at the MIT Media Lab. I am humbled by the support shown by the MIT Sloan, Media Lab, and Harvard communities. Earlier today, I finished up paperwork and submitted my archival copies. I'm done. Although I've often heard PhDs described as emotional roller coasters, I feel enormously blessed in that I honestly can't relate. My eight years at MIT and Harvard have been almost universally positive and I have learned and grown indescribably. As excited as I am about my next chapter at the University of Washington, I'm going to miss my life here. Deeply. My dissertation was three essays on volunteer mobilization in peer production. Once I have a chance to catch up and recover, I'll be posting the previously unpublished pieces. The Remixing Dilemma was included in the dissertation and is already online. The Media Lab AV team shot professional video of the talk. When I get a copy of the video, I'll post that too. But because I think it's important, I've formatted and published the acknowledgments section of the dissertation today. Although there are too many folks to thank, I've highlighted the contributions of my co-authors, and friends, Aaron Shaw and Andrés Monroy-Hernández and my almost unbelievably incredible group of advisors: Eric von Hippel, Yochai Benkler, Mitch Resnick, and Tom Malone.

21 July 2013

Benjamin Mako Hill: The Wikipedia Gender Gap Revisited

In a new paper, recently published in the open access journal PLOS ONE, Aaron Shaw and I build on new research in survey methodology to describe a method for estimating bias in opt-in surveys of contributors to online communities. We use the technique to reevaluate the most widely cited estimate of the gender gap in Wikipedia. A series of studies have shown that Wikipedia's editor base is overwhelmingly male. This extreme gender imbalance threatens to undermine Wikipedia's capacity to produce high quality information from a full range of perspectives. For example, many articles on topics of particular interest to women tend to be under-produced or of poor quality. Given the open and often anonymous nature of online communities, measuring contributor demographics is a challenge. Most demographic data on Wikipedia editors come from opt-in surveys where people respond to open, public invitations. Unfortunately, very few people answer these invitations. Results from opt-in surveys are unreliable because respondents are rarely representative of the community as a whole. The most widely-cited estimate, from a large 2008 survey by the Wikimedia Foundation (WMF) and UN University in Maastricht (UNU-MERIT), suggested that only 13% of contributors were female. However, the very same survey suggested that less than 40% of Wikipedia's readers were female. We know, from several reliable sources, that Wikipedia's readership is evenly split by gender, a sign of bias in the WMF/UNU-MERIT survey. In our paper, we combine data from a nationally representative survey of the US by the Pew Internet and American Life Project with the opt-in data from the 2008 WMF/UNU-MERIT survey to come up with revised estimates of the Wikipedia gender gap. The details of the estimation technique are in the paper, but the core steps, sketched roughly in code after the list, are:
  1. We use the Pew dataset to provide baseline information about Wikipedia readers.
  2. We apply a statistical technique called propensity scoring to estimate the likelihood that a US adult Wikipedia reader would have volunteered to participate in the WMF/UNU-MERIT survey.
  3. We follow a process originally developed by Valliant and Dever to weight the WMF/UNU-MERIT survey to correct for estimated bias.
  4. We extend this weighting technique to Wikipedia editors in the WMF/UNU data to produce adjusted estimates of the demographics of their sample.
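To make the idea concrete, here is a minimal, hypothetical sketch of steps 2-4 in Python (this is not the code we released; the input file, the column names, and the use of a logistic-regression propensity model are illustrative assumptions):

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Pooled data: rows for representative readers (responded = 0) and for
# opt-in survey respondents (responded = 1), sharing the same covariates.
pooled = pd.read_csv("readers_and_respondents.csv")
covariates = ["age", "education", "income", "internet_use"]

# Step 2: estimate each person's propensity to have taken the opt-in survey.
model = LogisticRegression().fit(pooled[covariates], pooled["responded"])
pooled["propensity"] = model.predict_proba(pooled[covariates])[:, 1]

# Steps 3 and 4: weight respondents by inverse propensity, then compute a
# weighted estimate, here the proportion of editors who are women.
resp = pooled[pooled["responded"] == 1].copy()
resp["weight"] = 1.0 / resp["propensity"]
editors = resp[resp["is_editor"] == 1]
share_female = (editors["weight"] * editors["female"]).sum() / editors["weight"].sum()
print("weighted estimate of female editors: %.1f%%" % (100 * share_female))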
Using this method, we estimate that the proportion of female US adult editors was 27.5% higher than the original study reported (22.7%, versus 17.8%), and that the total proportion of female editors was 26.8% higher (16.1%, versus 12.7%). These findings are consistent with other work showing that opt-in surveys tend to undercount women. Overall, these results reinforce the basic substantive finding that women are vastly under-represented among Wikipedia editors. Beyond Wikipedia, our paper describes a method online communities can adopt to estimate contributor demographics using opt-in surveys in a way that is more credible than relying entirely on opt-in data. Advertising-intelligence firms like ComScore and Quantcast provide demographic data on the readership of an enormous proportion of websites. With these sources, almost any community can use our method (and source code) to replicate a similar analysis by: (1) surveying a community's readers (or a random subset) with the same instrument used to survey contributors; (2) combining results for readers with reliable demographic data about the readership population from a credible source; (3) reweighting survey results using the method we describe. Although our new estimates will not help us close the gender gap in Wikipedia or address its troubling implications, they give us a better picture of the problem. Additionally, our method offers an improved tool to build a clearer demographic picture of other online communities in general.

2 July 2013

Ondřej Čertík: My impressions from the SciPy 2013 conference

I have attended the SciPy 2013 conference in Austin, Texas. Here are my impressions.

Number one is the fact that the IPython notebook was used by pretty much everyone. I use it a lot myself, but I didn't realize how ubiquitous it has become. It is quickly becoming the standard now. The IPython notebook uses Markdown, and in fact it is better than reST. The way to remember the "[]()" syntax for links is that in regular text you put links into () parentheses, so you do the same in Markdown, and prepend [] for the text of the link. The other way to remember is that [] feels more serious and thus is used for the text of the link. I stressed several times to +Fernando Perez and +Brian Granger how awesome it would be to have interactive widgets in the notebook. Fortunately that was pretty much preaching to the choir, as that's one of the first things they plan to implement good foundations for, and I just can't wait to use that.

It is now clear that the IPython notebook is the way to store computations that I want to share with other people, or to use as a "lab notebook" for myself, so that I can remember what exactly I did to obtain the results (for example how exactly I obtained some figures from raw data). In other words --- instead of having sets of scripts and manual bash commands that have to be executed in a particular order to do what I want, just use the IPython notebook and put everything in there.

Number two is how big the conference has become since the last time I attended (a couple of years ago), yet it still has the friendly feeling. Unfortunately, I had to miss a lot of talks due to scheduling conflicts (there were three parallel sessions), so I look forward to seeing them on video.

+Aaron Meurer and I have done the SymPy tutorial (see the link for videos and other tutorial materials). It's been nice to finally meet +Matthew Rocklin (a very active SymPy contributor) in person. He also had an interesting presentation about symbolic matrices + Lapack code generation. +Jason Moore presented PyDy. It's been a great pleasure for us to invite +David Li (still a high school student) to attend the conference and give a presentation about his work on sympygamma.com and live.sympy.org.

It was nice to meet the Julia guys, +Jeff Bezanson and +Stefan Karpinski. I contributed the Fortran benchmarks on Julia's website some time ago, but I had the feeling that a lot of them are quite artificial and not very meaningful. I think Jeff and Stefan confirmed my feeling. Julia seems to have a quite interesting type system and multiple dispatch that SymPy should learn from.

I met the VTK guys +Matthew McCormick and +Pat Marion. One of the keynotes was given by +Will Schroeder from Kitware about publishing. I remember him stressing the need to manage dependencies well, to use a BSD-like license (as opposed to viral licenses like GPL or LGPL), and that open source has pretty much won (i.e., it is now clear that that is the way to go).

I had great discussions with +Francesc Alted, +Andy Terrel, +Brett Murphy, +Jonathan Rocher, +Eric Jones, +Travis Oliphant, +Mark Wiebe, +Ilan Schnell, +Stéfan van der Walt, +David Cournapeau, +Anthony Scopatz, +Paul Ivanov, +Michael Droettboom, +Wes McKinney, +Jake Vanderplas, +Kurt Smith, +Aron Ahmadia, +Kyle Mandli, +Benjamin Root and others.


It's also been nice to have a chat with +Jason Vertrees and other guys from Schrödinger.

One other thing that I realized last week at the conference is that pretty much everyone agreed that NumPy should act as the default way to represent memory (no matter whether the array was created in Fortran or other code) and allow manipulations on it. Faster libraries like Blaze or ODIN should then hook themselves into NumPy using multiple dispatch. Also SymPy would then hook itself up so that it can be used with array operations natively. Currently SymPy does work with NumPy (see our tests for some examples of what works), but the solution is a bit fragile (it is not possible to override NumPy behavior, but because NumPy supports general objects, we simply give it SymPy objects and things mostly work).

Similar to this, I would like to create multiple dispatch in SymPy core itself, so that other (faster) libraries for symbolic manipulation can hook themselves up, so that their own (faster) multiplication, expansion or series expansion would get called instead of the SymPy default one implemented in pure Python.

Other blog posts from the conference:

26 June 2013

Benjamin Mako Hill: Lookalikes

Is Franz Sacher, the inventor of the famous Sachertorte, still alive and working at the Electronic Frontier Foundation? Might this help explain why EFF Technology Projects Director Peter Eckersley is so concerned about protecting privacy and pseudonymity?

22 June 2013

Benjamin Mako Hill: Iceowl's Awesome New Icon

If you're a Debian user, you are probably already familiar with some of the awesome icons for IceWeasel (rebranded Mozilla Firefox), IceDove (rebranded Mozilla Thunderbird) and IceApe (rebranded Mozilla SeaMonkey).

[Icons: IceWeasel, IceDove, and IceApe]

I was pretty ambivalent about the decision to rebrand Firefox until I saw some of the proposed IceWeasel icons, which in my humble opinion were just too cute, and awesome, to pass up.

Until very recently, however, IceOwl (rebranded Mozilla Sunbird) had no such awesome icon. Quite a while ago, I filed bug #658664 in Debian complaining that iceowl does not include awesome icy owl icons. I wrote:

I was extremely disappointed when I installed Iceowl and discovered that it does not ship with an awesome logo or icons showing a picture of an IceOwl. Instead, it seems to be represented by a picture of a (boring) paper calendar which is very generic and not awesome at all. IceWeasel, IceDove, and IceApe each include extremely awesome logos/icons that have really cool looking white illustrations of icy weasels, doves, and apes. IceOwl needs a similarly awesome logo to use as its icon. This bug seems particularly egregious because owls actually live in icy climates and come in white versions! For example: https://commons.wikimedia.org/wiki/File:Snowy_Owl_-_Schnee-Eule.jpg While illustrators need to imagine what an ice ape or ice weasel might look like, there is no such need for imagination in the case of an ice owl! As far as I'm concerned, this bug should be release critical. Hopefully, someone will upload a patch quickly!
Finally, after many months of all of us suffering in silence, Nick Morrott came along and fixed the bug with the creation of this new, incredibly awesome, icy owl logo!

[The new Iceowl icon]

19 June 2013

Benjamin Mako Hill: Job Market Materials

Last year, I applied for academic, tenure-track jobs at several communication departments, information schools, and HCI-focused computer science programs with a tradition of hiring social scientists. Being "on the market", as it is called, is both scary and time consuming. Like me, many candidates have never been on the market before. Candidates are asked to produce documents in genres (e.g., cover letters, research statements, teaching statements, diversity statements) that most candidates have never written, read, or even heard of.

Candidates often rely on their supervisors for advice. I did so and my advisors were extremely helpful. The reality, however, is that although candidates' advisors may sit on hiring committees, most have not been on the candidates' side of the job market themselves for years or even decades. The Internet is full of websites, like the academic jobs wiki, Academia StackExchange, and the Chronicle of Higher Education forums, for people on the market. Confused and insecure candidates ask questions of the form, "Does blank matter?" and the answer is usually, "Doing/having blank may help/hurt, but it is only one factor of many." The result is that candidates worry about everything. Then they worry about what they should be worrying about, but are not.

The most helpful thing, for me, was to read and synthesize the material submitted by recent successful job market candidates. For example, Michael Bernstein, a friend from MIT now at Stanford, published his research and teaching statements on his website and I found both useful as I prepared mine. That said, I was surprised by how little material like this I could find on the web. For example, I could not find any examples of recent job market cover letters from successful candidates in fields close to mine.

So to help fill this gap, I am publishing all of my job market material. I've posted both the PDFs of the material I submitted as well as the LaTeX templates I used to generate the documents in my packet. My packet included: I hope people going on the market will find these materials useful. Obviously, you should not copy or reuse the text of any of my material. It is your application, after all. That said, please do help yourself to the formatting and structure. Finally, I would encourage anyone who builds on my material to republish their own material to help other candidates. If you do, I'd appreciate a link back or a comment on this blog post so that my readers can find your improvements.
